[python-package] Separately check whether `pyarrow` and `cffi` are installed #6785

mlondschien · 2025-01-12T19:16:50Z

The only tests I can think of would require a runner with pyarrow but not cffi installed.

Note that the LightGBMErrors are only raised when pyarrow is installed but cffi is not. If pyarrow is not installed, pa_Table is a dummy class and isinstance(data, pa_Table) returns False.
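To illustrate the dummy-class behaviour described above, here is a minimal sketch of the optional-import pattern. The names mirror the ones used in this thread, but the block is an illustration under that assumption, not a copy of lightgbm's compat.py; `is_arrow_table` is a hypothetical helper.

```python
# Sketch of an optional-import guard. Not LightGBM's actual code.
try:
    from pyarrow import Table as pa_Table

    PYARROW_INSTALLED = True
except ImportError:
    PYARROW_INSTALLED = False

    class pa_Table:  # type: ignore
        """Dummy class standing in for pyarrow.Table when pyarrow is missing."""


def is_arrow_table(data) -> bool:
    """Hypothetical helper: True only for real pyarrow Tables."""
    # With the dummy class, no user data is ever an instance of pa_Table,
    # so this check is simply False when pyarrow is not installed.
    return isinstance(data, pa_Table)


print(PYARROW_INSTALLED, is_arrow_table([1, 2, 3]))
```

Because the isinstance() check can never succeed without pyarrow, any error raised behind that check is only reachable when pyarrow itself is importable.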

This is a breaking change for users who didn't install lightgbm[arrow], but instead installed lightgbm and pyarrow separately. Even if not intended, they could previously train a model on a pyarrow.Table, as it was converted via scipy.sparse.csr_matrix(data). The fix is simply to install cffi or to convert manually with scipy.sparse.csr_matrix.

Still, it is good to inform people that they are not "natively" training from a pyarrow.Table, incurring an unnecessary copy.

As already suggested in #6782, an alternative would be to raise a warning.

jameslamb

The only tests I can think of would require a runner with pyarrow but not cffi installed.

Could you try adding tests that mock cffi not being available by mocking sys.modules? I tried that in an environment with scikit-learn (another optional dependency of lightgbm) installed and it seemed to work ok:

import sys
from unittest import mock

with mock.patch.dict(sys.modules, {'sklearn': None}):
    import lightgbm as lgb
    print(lgb.compat.SKLEARN_INSTALLED)
    # False

import lightgbm as lgb
print(lgb.compat.SKLEARN_INSTALLED)
# True

We don't have any examples of that in lightgbm's test suite, but I think it'd be interesting to try.

python-package/lightgbm/basic.py

mlondschien · 2025-01-14T16:49:19Z

It appears this does not work. I don't really understand why.

mlondschien · 2025-01-23T13:54:09Z

@jameslamb How would you like to continue here?

jameslamb · 2025-01-24T03:32:23Z

I feel it'd be easy to accidentally undo this work in future refactorings. I will try to find a way to add a test covering this.

jameslamb · 2025-01-26T20:01:06Z

python-package/lightgbm/basic.py

-        if not PYARROW_INSTALLED:
-            raise LightGBMError("Cannot init dataframe from Arrow without `pyarrow` installed.")
+        if not (PYARROW_INSTALLED and CFFI_INSTALLED):
+            raise LightGBMError("Cannot init dataframe from Arrow without `pyarrow` and `cffi` installed.")


Suggested change

raise LightGBMError("Cannot init dataframe from Arrow without `pyarrow` and `cffi` installed.")

raise LightGBMError("Cannot init Dataset from Arrow without `pyarrow` and `cffi` installed.")

This really should be Dataset, not dataframe... I'll make that change when I push testing changes.

Fixed in e72d5e2.

In that commit, I also removed backticks from these log messages, in favor of single quotes. Special characters in log messages can occasionally be problematic.

I know these things were already there before this PR, but might as well fix them right here while we're touching these lines.

python-package/lightgbm/compat.py

jameslamb · 2025-01-26T21:02:11Z

tests/python_package_test/conftest.py

+def missing_module_cffi(monkeypatch):
+    """Mock 'cffi' not being importable"""
+    monkeypatch.setattr(lightgbm.compat, "CFFI_INSTALLED", False)
+    monkeypatch.setattr(lightgbm.basic, "CFFI_INSTALLED", False)


Came up with this based on https://docs.pytest.org/en/stable/reference/reference.html

I'm hoping that this could establish a pattern we re-use in other tests in future PRs.

It's not perfect (for example, if setting CFFI_INSTALLED is done incorrectly in compat.py, then this approach wouldn't catch that), but it's a lightweight and simple way to ensure we always cover code like the changes introduced in this PR.
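A small sketch of why the fixture patches the flag in both modules. It assumes (this is not confirmed in the thread) that basic.py picks up the flag with a `from ... import` statement, which copies the binding at import time. `demo_compat` and `demo_basic` are hypothetical stand-ins, and `unittest.mock.patch.object` plays the role of pytest's `monkeypatch.setattr` so that the block has no third-party dependencies.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for a 'compat' module that defines the flag.
compat = types.ModuleType("demo_compat")
compat.CFFI_INSTALLED = True
sys.modules["demo_compat"] = compat

# Hypothetical stand-in for a module that imports the flag by name.
basic = types.ModuleType("demo_basic")
exec(
    "from demo_compat import CFFI_INSTALLED\n"
    "def arrow_supported():\n"
    "    return CFFI_INSTALLED\n",
    basic.__dict__,
)

# Patching only the defining module does not reach the copied binding.
with mock.patch.object(compat, "CFFI_INSTALLED", False):
    only_compat_patched = basic.arrow_supported()

# Patching both modules does.
with mock.patch.object(compat, "CFFI_INSTALLED", False), mock.patch.object(
    basic, "CFFI_INSTALLED", False
):
    both_patched = basic.arrow_supported()

sys.modules.pop("demo_compat", None)  # clean up the demo module
print(only_compat_patched, both_patched)
```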


@jmoralez @borchero @StrikerRUS what do you think about this approach?

This approach looks quite fragile. I think a more complicated but more correct approach would be one of those described here: https://stackoverflow.com/a/51048604.

That is a lot more complicated :/

I'll try it though, I do agree that it'd be a stronger test to go all the way into getting the import to literally raise an ImportError.
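For comparison, here is one possible way to make the import itself fail so that the try/except that computes the flag is exercised. This is a sketch under stated assumptions, not necessarily the approach from the linked answer: `demo_optional_flags` is a throwaway module written to a temporary directory, and the stdlib module `colorsys` stands in for the optional dependency.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path
from unittest import mock

# Throwaway module whose flag is computed at import time.
source = textwrap.dedent(
    """
    try:
        import colorsys  # stands in for an optional dependency
        DEP_INSTALLED = True
    except ImportError:
        DEP_INSTALLED = False
    """
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_optional_flags.py").write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        flags = importlib.import_module("demo_optional_flags")
        flag_at_first_import = flags.DEP_INSTALLED

        # A None entry in sys.modules makes 'import colorsys' raise ImportError.
        # The flag only changes because the module body is re-executed.
        with mock.patch.dict(sys.modules, {"colorsys": None}):
            importlib.reload(flags)
            flag_when_hidden = flags.DEP_INSTALLED

        # Reload again so the flag reflects the real environment.
        importlib.reload(flags)
        flag_after_restore = flags.DEP_INSTALLED
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_optional_flags", None)

print(flag_at_first_import, flag_when_hidden, flag_after_restore)
```

The trade-off is the one discussed above: this covers the import-time logic, at the cost of reloading modules inside the test.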

jameslamb · 2025-01-26T21:03:02Z

tests/python_package_test/test_arrow.py

+            generate_dummy_arrow_table(),
+            label=pa.array([0, 1, 0, 0, 1]),
+            params=dummy_dataset_params(),
+        ).construct()


.construct() is necessary here... __init_from_pyarrow_table() is not run as part of lgb.Dataset().
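The point about .construct() can be pictured with a toy class. `LazyDataset` is hypothetical and far simpler than lgb.Dataset; it only shows the general shape of deferred initialization, where input handling (and any error it raises) happens in construct() rather than in the constructor.

```python
class LazyDataset:
    """Hypothetical stand-in: input handling runs in construct(), not __init__()."""

    def __init__(self, data):
        self.data = data
        self._handle = None  # nothing is validated or built yet

    def construct(self):
        if self._handle is None:
            if not isinstance(self.data, list):
                raise TypeError("unsupported data type")
            self._handle = len(self.data)
        return self


# Creating the object does not raise, even with unsupported input.
try:
    ds = LazyDataset("not a list")
    raised_at_init = False
except TypeError:
    raised_at_init = True

# The error only surfaces once construct() is called.
try:
    ds.construct()
    raised_at_construct = False
except TypeError:
    raised_at_construct = True

print(raised_at_init, raised_at_construct)
```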

I pushed changes here, my review shouldn't count towards a merge.

jameslamb · 2025-01-26T21:05:03Z

Ok, think I found a pattern that'll work for this testing! I just pushed e72d5e2, proposing that and adding some other small fixes.

Let me know if it looks ok to you @mlondschien .

I've also dismissed my review... now that I've made such significant edits here, my review shouldn't count towards a merge. @StrikerRUS @borchero @jmoralez could one of you help with a review?

StrikerRUS

Just some minor suggestions below.

python-package/lightgbm/compat.py

StrikerRUS · 2025-01-27T19:27:47Z

python-package/lightgbm/compat.py

+    CFFI_INSTALLED = True
+except ImportError:
+    CFFI_INSTALLED = False
+
    class arrow_cffi:  # type: ignore
        """Dummy class for pyarrow.cffi.ffi."""


Why do we need

CData = None
addressof = None
cast = None
new = None

class members?

CData is needed for type hinting:

LightGBM/python-package/lightgbm/basic.py

Line 416 in 9f1af05

chunks: arrow_cffi.CData

But I think the others could be safely removed. It's only showing up in the diff in this PR because this code is being moved around... so this was missed in earlier PRs (I guess #6034).

Removed all but CData in 7396613

StrikerRUS · 2025-01-27T19:44:18Z

tests/python_package_test/test_arrow.py

+    with pytest.raises(
+        lgb.basic.LightGBMError, match="Cannot predict from Arrow without 'pyarrow' and 'cffi' installed."
+    ):
+        bst = lgb.train(


I think the lgb.train() call should be outside the with block, because we expect only predict() to fail.

You are absolutely right, thank you!

Fixed in 7396613
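A sketch of the scoping point, using hypothetical `train` and `predict` functions and the stdlib `unittest` assertion so the block has no third-party dependencies. The same reasoning applies to a `pytest.raises` block: only the call that is expected to fail belongs inside it.

```python
import unittest


class DemoError(Exception):
    """Hypothetical error type standing in for LightGBMError."""


def train():
    # Hypothetical: training is expected to succeed.
    return "model"


def predict(model):
    # Hypothetical: prediction is the only step expected to fail.
    raise DemoError("Cannot predict from Arrow without 'pyarrow' and 'cffi' installed.")


case = unittest.TestCase()

# Setup stays outside the block, so an unexpected failure in train()
# surfaces as a test error instead of satisfying the assertion.
model = train()

with case.assertRaisesRegex(DemoError, "Cannot predict from Arrow") as ctx:
    predict(model)

print(ctx.exception)
```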

Implement code as suggested by @jameslamb

6b37b34

mlondschien requested review from guolinke, jameslamb, shiyu1994, jmoralez, borchero and StrikerRUS as code owners January 12, 2025 19:16

jameslamb changed the title ~~Separately check whether pyarrow and cffi are installed~~ [pyhton-package] Separately check whether pyarrow and cffi are installed Jan 12, 2025

jameslamb changed the title ~~[pyhton-package] Separately check whether pyarrow and cffi are installed~~ [python-package] Separately check whether pyarrow and cffi are installed Jan 12, 2025

jameslamb added the fix label Jan 12, 2025

jameslamb previously requested changes Jan 13, 2025

mlondschien added 2 commits January 14, 2025 17:19

Add test.

512550b

Try pyarrow.cffi

a85027b

mlondschien requested a review from jameslamb January 16, 2025 09:23

jameslamb mentioned this pull request Jan 23, 2025

WIP: release v4.6.0 #6796

Draft

31 tasks

jameslamb added 2 commits January 26, 2025 11:54

Merge branch 'master' of github.com:microsoft/LightGBM into issue-6782

5852ce6

Merge branch 'master' of github.com:microsoft/LightGBM into issue-6782

acd5729

jameslamb reviewed Jan 26, 2025

fix compat.arrow_cffi, clarify log messages, fix tests

e72d5e2

jameslamb reviewed Jan 26, 2025

keep fixtures in alphabetical order

a4a711d

jameslamb added the awaiting review label Jan 27, 2025

StrikerRUS reviewed Jan 27, 2025

jameslamb and others added 4 commits January 27, 2025 13:51

Merge branch 'master' into issue-6782

8992344

Update python-package/lightgbm/compat.py

56127f5

Co-authored-by: Nikita Titov <[email protected]>

update docstring, remove unnecessary class members, re-organize test

7396613

Merge branch 'issue-6782' of github.com:mlondschien/LightGBM into issue-6782

0526390
